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Abstract 

The problem of estimating a normal covariance matrix is consid¬ 
ered from a decision-theoretic point of view, where the dimension of 
the covariance matrix is larger than the sample size. This paper ad¬ 
dresses not only the nonsingular case but also the singular case in 
terms of the covariance matrix. Based on James and Stein’s minimax 
estimator and on an orthogonally invariant estimator, some classes of 
estimators are unifiedly defined for any possible ordering on the dimen¬ 
sion, the sample size and the rank of the covariance matrix. Unified 
dominance results on such classes are provided under a Stein-type en¬ 
tropy loss. The unified dominance results are applied to improving on 
an empirical Bayes estimator of a high-dimensional covariance matrix. 
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1 Introduction 


This paper addresses the problem of a normal covariance matrix relative to 
the Stein loss, where the dimension of the covariance is larger than the sample 
size. This problem is precisely formulated as follows: Let Xi, X 2 , ■ ■ ■, X^ be 
independently and identically distributed as Ap(Op, S). Assume that p > n 
and I] is a p X p positive definite matrix of unknown parameters. Denote 
S = Y^^=iXiX\. Then S is distributed as 


5~>Vp(n,S). 


( 1 . 1 ) 


In the p > n case, Srivastava and Khatri (1979, page 72) and Diaz-Garcia et 
al. (1997) called Wp(n, S) the pseudo Wishart distribution with n degrees 
of freedom and mean nS. We here consider the problem of estimating H 
relative to the Stein loss 


Lp((5, S)=trS ^(5 —logdet(S ^6)—p, 


( 1 . 2 ) 


where 6 stands for an estimator of S. Assume that, with probability one, 
6 is an positive definite matrix based on S. The accuracy of estimators is 
measured by the risk function Rp{d, S) = E[Lp{S, S)], where the expectation 
is taken with respect to the model fll.ip . 

If n > p, then the Wishart matrix S has the same rank p as the covari¬ 
ance matrix S with probability one. In such case, many decision-theoretic 
studies have been done for the problem of estimating S in the literature. 
James and Stein (1961) hrst discussed decision-theoretic estimation of S. 
They considered the LU decomposition of S and succeeded to derive a min¬ 
imax estimator of S relative to the Stein loss fll.2p . The James and Stein 
(1961) minimax estimator, however, depends on the coordinate system. The 
dependence results in inadmissibility of their minimax estimator. Typical im¬ 
proved estimators on James and Stein’s minimax estimator are orthogonally 
invariant estimators, which are not influenced by the coordinate system. The 
orthogonally invariant estimators have been proposed by Stein (1975, 1977). 
See also Dey and Srinivasan (1985), who gave other dominance results via 
orthogonally invariant estimators. 

In the p > n case, Kubokawa and Srivastava (2008) and Konno (2009) 
studied decision-theoretic covariance estimation relative to quadratic losses. 
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However, an analytical dominance resnlt in the p > n case with the Stein 
loss fll.2p has not been obtained as yet. 

This paper gives some dominance resnlts relative to the Stein loss fll.2l) 
in the p > n case and extends the dominance results to the case where 
S is singular. To this end, Section [2] starts with unihedly considering the 
estimation problem for any possible ordering on n, p and the rank of S. The 
singular case does not allow us to use the Stein loss fll.2p because the inverse 
of the singular S does not exist. Therefore Section [2] dehnes a Stein-like loss 
function for estimation of the singular S. We give a unihed expression of the 
James and Stein type estimator for all possible orderings on n, p and the rank 
of S. Section [2] also provides a unihed expression of orthogonally invariant 
estimators improving on the James and Stein type estimator relative to the 
Stein-like loss. 

Section [3] mainly discusses the p > n case for estimation of a nonsingular 
S relative to the usual Stein loss fll.2l) . An empirical Bayes estimator is 
derived from an inverse Wishart prior. Some improving methods on the 
empirical Bayes estimator are established by using the dominance results 
obtained in Section [21 The Monte Carlo simulations show that an improved 
estimator performs well when p is much larger than n. Moreover alternative 
estimators are unihedly constructed for both nonsingular and singular cases 
in terms of S. In Section 01 we give some remarks on our results of this 
paper and related topics. 

2 Unified dominance results on covariance es¬ 
timation 

2.1 Preliminaries 

First, we describe the problem of estimating a covariance matrix unihedly in 
the nonsingular and the singular cases. 

Assume that the p x n observation matrix X has the form 

X = BZ, (2.1) 

where B is a. p x r matrix of unknown parameters with p > r and Z is 
an r X n random matrix. Assume that B is of full column rank, namely r. 
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and r is known. Let all the colnmns of Z be independently and identically 
distribnted as Mri^rilr)- Then the colnmns of X are i.i.d. sample from 
A/J,(0p, S), where S = BB^ is a positive semi-dehnite matrix of rank r. 
Denote 

S = XX\ 

which follows Wp(n, S). In the case where r < p, Ap(0p, S) and Wp^n, S) 
represent, respectively, the singnlar mnltivariate normal and the singnlar 
Wishart distribntions. For the dehnition of the singnlar distribntions, see 
Srivastava and Khatri (1979, pages 43 and 72) and also Diaz-Garcfa et ah 
(1997). Note also that S is of rank r, while S is of rank min(n, r) with 
probability one. 


In this section, we handle only estimators which are positive semi-dehnite 
matrices of rank 

q = min(n, r) 

with probability one. Write snch estimators as 6g. Moreover, 6g are also 
assnmed to satisfy the condition that the rank of is q with probability 

one, where S’*” is the Moore-Penrose psendo-inverse of S. Since 5q and S’*" 
are positive semi-dehnite, the q nonzero eigenvalnes of S’*’<5g are positive. 
Note that trS’*’5g is eqnal to a snm of all the positive eigenvalnes of S’*’5q. 
Both nonsingnlar and singnlar cases of the Stein loss fll.2p are nnihedly de- 
hned as 

Lq{dg, S) = tr S’*’(5g - log 7r(S’*’5q) - q, (2.2) 

where 7r(S”*’(5q) stands for a prodnct of all the positive eigenvalnes of S’*’(5q. 
Then we consider the problem of estimating S relative to the Stein loss fl2.2p . 
The corresponding risk fnnction is denoted by 


Rq{6q,^) = E[Lq{Sq,i:)], (2.3) 

where the expectation is taken with respect to the model fl2.ip . 

Next, we dehne some notation. Let 0{r) be the gronp of orthogonal 
matrices of order r. For p > r, the Stiefel manifold is denoted by Vp^r = 
{A G ; A^A = Ir}. It is noted that Vr,r = 0{r). Let be a 

set of r X r diagonal matrices whose diagonal elements di,...,dr satisfy 
di > ■ ■ ■ > dr > 0. Denote by the gronp of lower triangnlar matrices 
with positive diagonal elements. 
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The Stein loss fl2.2p depends on the Moore-Penrose pseudo-inverse of S. 
Here some properties are listed for the Moore-Penrose pseudo-inverse. The 
proof of the following lemma is given in Harville (1997, Chapter 20). 

Lemma 2.1 Let B be a p x r matrix of full eolumn rank. Then the Moore- 
Penrose pseudo-inverse of B uniquely exists and has the following prop¬ 
erties: 

( 1 ) = {B^B)-^B^; 

(2) Lf + = for H G 

(3) = B ^ for a nonsingular matrix B; 

(4) {B+Y = [BY; 

(5) {BCY = {CYB+ for a q X r matrix C of full column rank. 

2.2 Constant multiple estimators 

Consider a simple class of estimators whose forms are a constant multiple of 
S. The simple class is represented by 

S^{a) = aS, (2.4) 

where a is a positive constant and q = min(n, r). This class includes the 
unbiased estimator of 



However is not the best estimator among the class fl2.4p relative to 
the Stein loss fl2.2p . Note by Lemma l2T] that = {BB^^ = {B^^B^ 
and 

S+5 = {BYB+BZZ^B^ = {BY^ZZ^B\ 
which implies that I]^(5^(a) has the same rank as ZZ^. 

Proposition 2.1 Define m = max(?7,,r) and 

1 

— • 

m 

Then = 5^(am) is the best estimator among the class fl2.4p relative to 
the Stein loss fl2.2l) . Hence for r > n, 6^^ dominates 6^^ relative to the 
Stein loss fl2.2p . 
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Proof. The nonzero eigenvalues of are identical to those of S , 
so that the nonzero eigenvalues of are identical to those of the full-rank 
matrix 

ZZ^ for n > r, 

Z^Z for n < r. 

Since the number of nonzero eigenvalues of 'S'^S is q = min(n, r) with prob¬ 
ability one, we obtain 7r(aS''^5') = a'^7r(Z’Z*), so that the risk of Sq{a) with 
respect to the Stein loss fl2.2p is expressed as 

i?q(<5^(a), S) = na trS'^S — gloga — i?[log7r(ZZ*)] — q 
= nra — gloga — i?[log7r(ZZ*)] — q. 

The risk of (a) is minimized by (am) with 

- JL- 1. 

— — 

nr m 

Thus the proof is complete. ■ 

It follows from equation (82) of James and Stein (1961) that 

9 

£^[log 7r(Z’Z’*)] E[logSi\, (2.5) 

i=l 

where Si ~ Xm-i+i- Hence has the constant risk 

<? 

= (l^ogm-'^E[logSi\. ( 2 . 6 ) 

i=l 

2.3 The James and Stein type estimator 

We next construct a James and Stein (1961) like estimator of S for any 
possible ordering on n, p and r. 

Using the same arguments as in Srivastava (2003, equation (2.2)), we can 
write the p x n random matrix W as a block matrix 

(Xn Xu 
V^21 X 22 
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where Xu is a g x g nonsingular matrix. Recall that X = BZ. Partition 
B and Z into block matrices as, respectively, 


B = 



Z = (Zi.Z2), 


where Bi and Zi are, respectively, q x r and r x q matrices. Note that 
Xii = BiZi. Since Xu is nonsingular, Bi has a full row rank. Thus, there 
exist unique elements © G and Fi G Vr,q such that Bi = ©F^. The 
decomposition B\ = Fi©* represents the QR decomposition of B\. Also, 
Bi = ©F^ is called the LQ decomposition of Bi. For the uniqueness of the 
QR decomposition, see Harville (1997, page 67). 

Take F 2 G Vr,r-q such that F = (Fi,F 2 ) G 0{r). For r < n, the LQ 
decomposition of T^Z can be written as YV^, where Y G 7)!^ and V G Vn,r- 
For r > n, F*Z is denoted by the block matrix 



where Z\ and Z 2 are, respectively, n x n and {r — n) x n matrices. The LQ 
decomposition of Zi can be written as YiV^ for Xi G 7))^ and V G 0{n), 
which gives that 


F*Z 




V*, 


where Y 2 = Z 2 V. Hence F*Z can uniquely and unifiedly be expressed as 


F*Z = YV^ for V G Vn,q and Y = 


Y2I ’ 


where Xi G and X 2 is a (r — g) x g matrix. 

Let C = BF, which is written as 

Combining fl2.7p and fl2.8p . we can uniquely decompose X as 

X = .BFF*Z = TV\ 


(27) 


(2.8) 


(2.9) 
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where 


T = CY 


&Y, 

B2TiY,+B2T2Y2 


It is then noted that ©1^i G Tg^. 

The probability distributions of nonzero elements of Y are given as fol¬ 
lows. 


Lemma 2.2 For i = 1,... ,q and j = i,... ,r, denote by yj^i the (j, i)-th 
element ofY. Then all the elements yj,i’s are mutually independent and 

vli ~ xl-i+i, yj,i ~ AA(0,1) = j =i + l,...,r). 

Proof. It is noted that T^Z ~ A/'rxn(0rxn; Ir ® In)- For the n > r and 
n <r cases, see Lemma 3.2.1 of Srivastava and Khatri (1979) and Corollary 
3.1 of Srivastava (2003), respectively. □ 

Applying fl2.9p to the Wishart matrix S = XX^ gives that 
XX* = TT*= (Tl,T*), 

where T = (T^, T^)* is p x g matrix such that Ti = ©1^i G 7^'*“ and T 2 is a 
{p — q) X q matrix. Then we consider the class of estimators, which has the 
form 

6^ = TDgT\ (2.10) 

where Dg = diag((ii, ... ,dg) and the dj’s are positive constants. 

Proposition 2.2 Let Dg^ = diag{d '(^,..., dg^), where = {n + r — 2i + 
1)“^ for i = 1,..., g. Then the best estimator among the class 02.101) relative 
to the loss 02.2p is given by Sg^ = TDg^T^, which dominates Sf^. 

Proof. Noting from Lemma 12.11 that 

C*S+C = T^B\BBYBT = T^B\BYB^BT = Ir, 


we observe 


E[tT'E-^5^] = E[tT DgT^'E'^T] = E[tT DgY^C^'E^CY] = E[tTDgY^Y]. 

( 2 . 11 ) 









In the p > r > n case, we partition Y a.sY = (Y\, Y^Y, where Y i G T^Y- 
From Lemma [2.21 it follows that = {r — n)In and E\Y\Yi] is the 

diagonal matrix of order n with the Fth diagonal element 

n n 

^^yU = {n - i + l) + {n - Y = 2n - 2i + l. 

j=i j>i 

Thus we obtain 

n 

E[tiDnY^Y] = E[tiDnY\Y^+tiDnY\Y2] = ^{n+r-2i + l)di. ( 2 . 12 ) 

i=l 

When n > r, it follows that 

r r r 

E[iiDrY^Y] = ^^E[diyl,] = ^{n + r-2i + l)di. (2.13) 

i=l j=i i=l 

Combining fl2.1ip . fl2.12l) and fl2.13p gives that 

Q 

E[trS+(5j^] = ^(n + r-2i + l)(ii. (2.14) 

i=l 

It is seen that has the same nonzero eigenvalues as DgT^'S^T, which 

implies that 

7r(S+5j^) = 7i{DgT^^+T) = 7i{DgY^Y) 

Since Y^Y is a g x g square matrix of full rank, it follows that 

7r(S+5^) = detiDgY^Y) 

= det{Dg) det(l"*r) 

= det(i:>g)7r(ZZ*). (2.15) 

Using fl2.14p and fl2.15p . we can write the risk of Sg under the loss (12.21) as 

Q 

Rg{6^, S) = ^{(n + r - 2^ + l)di - \ogdi} - E[log7r(ZZ*)] - g. 

i=l 

Hence the d^s minimizing the risk are given by = (n + r — 2i + l)“^ for 
i = l,...,g. 
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Since 6^^ belongs to the class fl2.10p . dominates <5^*^ relative to the 
Stein loss fl2.2p . In fact, <5^'^ has the constant risk 

1 

Rg{df, ^) = J2 + r - 2* + 1) - E[log7r(ZZ*)], (2.16) 

i=l 

which implies by fl2.5p and fl2.6p that 

9 

S) - Rq{^q^, S) = ^ log(n + T - 2i + 1) - ^ log 771 < 0, 

where the inequality follows from concavity of the logarithmic function. Thus 
the proof is complete. ■ 

The probability density function of T can be derived explicitly. The 
n > p = r case is obtained from, for example, Srivastava and Khatri (1979, 
Lemma 3.2.2). For the p > r case, see Srivastava (2003, Theorem 5.2) and 
Diaz-Garcia and Gonzalez-Farias (2005, Gorollary 4). 

2.4 Orthogonally invariant estimators 

Make the QR decomposition of B into TBq, where T G Vp,r and Bq G 
. We can uniquely express 5 as 5 = TB^ZBqT^. Dehne W = 
BqZZ^Bq, which is distributed as Wr(n., f2) with O = BqBq, where f2 is 
positive dehnite. The eigenvalue decomposition of W is written as RLR*, 
where L G D^, R G Vr,q and q = min(n, r). Hence we can decompose S as 

S = HLH\ 

where T G Dg and H = XR G Vp,g. 

Gonsider the class of estimators 

= 5^(5) = H^{L)H\ 

where 4*(T) = diag(0i(T),..., (j)q{L)) and the 0i(T)’s are absolutely contin¬ 
uous functions of L. The class 6^ is orthogonally invariant in the sense that 
it satishes 0(5^(5)0‘ = (5^(050*) for any 0 G 0{p). 

To evaluate the risk of 6^, we require the following lemma. 
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Lemma 2.3 Abbreviate 4>i{L) by (p^. Denote L = diag(£i, ... ,ig). Then we 
have 


E[trS+Lf$(L)Lr‘] = E 


r 1 


i=l 




j>i 


Proof. It follows from Lemma 12.11 that 

S+ = {rnrY = {nrYr+ = {ryn+r+ = 


so that 

E[trS+Lf$(L)Lf‘] = E[tr J^‘]. (2.17) 

Recall that W ~ >Vr(n, f2) and the eigenvalue decomposition of W is de¬ 
noted by RLFt. The remainder of the proof for n > r is based on the same 
arguments as in Sheena (1995, Section 2) and, for n < r, on those as in 
Kubokawa and Srivastava (2008, Lemma A.l). Their results are applied to 
the r.h.s. of fl2.17p . so we get this lemma. ■ 

Lemma 2.4 The risk of under the loss fl2.2p is written as 


R(5,^, S) = E 


1=1 


In I 1 \ I o I o ^3 1 

2^ pi" - >-1 - 1 ) 7 : + 2^ + 2 7 : 37 -- log 77 


j>i 


ii - I, 


-E\\ogTi{ZZ%-q. 

Proof. Note that 

7r(S+(5^) = 7r(ZZ') det(L-^$(L)). (2.18) 

Using fl2.18p and Lemma 12.31 gives the risk expression of this lemma. ■ 
The following proposition results from Lemma 12.41 


Proposition 2.3 Define 


Sf = HLDg^Hk 


\JS Trt 


Then dominates 6^^ relative to the Stein loss (12.2p . 
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Proof. We can prove this proposition in the same way as in Dey and 
Srinivasan (1985, Theorem 3.1). Using Lemma [2.41 and fl2.16p . we can write 
the difference in risk of 6^^ and 6'^^ as Rg{dg^, S) — , T,) = E[A ], 

where 




+ l)dt 


JS 


1 ^JSp, 

j>i 


dfi, 


- R 


o -i c'~r' (-• r c 

Hence if A <0, then Og dominates Og . It is observed that 


1 1 ■ 

V V ^3 

2^ 2^ 

i=l j>i ^ ^ 


^^di^{P,-Pg) + {di^-df)P, 

£. — £. 

i=l j>i ^ ^ 


i=l j>i i=l 


where the ineqnality is verified by the ordering properties ii > ■ ■ ■ > ig and 
df^ < ■ ■ ■ < d-^g^. Thns we obtain 

^ST 9 

A < ^(|n — r| + 1 + 2g — 2i)df^ — Q = + r — 2i + l)df^ — q = 0, 

i=l i=l 

which completes the proof. ■ 

Besides given above, many types of orthogonally invariant estimators 
are proposed for the n >p = r case. See, for example, Stein (1977), Dey and 
Srinivasan (1985), Haff (1991), Perron (1992), Sheena and Takemnra (1992) 
and Yang and Berger (1994). Their resnlts would be applicable to the cases 
when min(n,p) > r and p>r>n. 


3 Estimation of a high-dimensional covariance 
matrix 

3.1 An empirical Bayes estimator 

We here deal with the problem of estimating H in the model fll.ip relative 
to the usual Stein loss fll.2p . Note that the covariance matrix S is of rank p, 
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while the Wishart matrix S is of rank n. Using an empirical Bayes method, 
we first provide a full-rank estimator as a target which should be improved. 

Denote X = (Xi,X„), where the Xj’s are i.i.d. sample from A/^(Op, S). 
Note that X is a p x n matrix and S = XX*. Then the likelihood of S is 
proportional to 

L(S|X) oc (detS)-'^/ 2 g^p 

Assume that S has a prior density 

p(S|A) oc (det exp - ^tr S"*), A > 0. 

The resulting Bayes estimator is written as 

( 3 - 1 ) 

^ n + k n + k 

Here we estimate A from the marginal density of X, 

p(X|A) = XA*'P/2{det(XX* + 

= XA*=P/2{det(X*X + 

where X is a normalizing constant. Since det(X*X -|- A7„) = n”=i(^. + A), 
where the i/s are eigenvalues of X*X, the logarithm of the marginal density 
p(X|A) is given by 


log p(X I A) = ^ log A - ^ \og{ij + A) + log X, 


j=i 


which is used to obtain 
d 




-logp(X|A) = -A- - 2 ^ 

j = l J 


= 0 , 


namely, the maximum likelihood estimator of A is a solution of 

A kp 


7 = 1 


-I- A n + k' 
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Denote by the resulting maximum likelihood estimator of A. Substitute 
y^ML ^ f|3.ip . we get the empirical Bayes estimator 


5 


B 

P 


1 

n + k 


(S+A"'-/,). 


(3.2) 


Motivated by fl3.2p and taking account of Proposition 12.11 we dehne the 
class of estimators as 

df^{b) = ap{S + XkIp), (3.3) 


where ap = p b 
Xb (> 0) satishes 


b{S) is a differentiable bounded function of S, and 


n 


E 


Xb 

(-j + Xb 


b. 


(3.4) 


For existence of a unique solution Xb, b requires at least that 0 < 6 < n. Note 
also that Sp^{b) is of full-rank with probability one. 

To compare risk functions, we need the lower and the upper bounds of 
Xb- Note from Lemma 12. II that 


= tr = tr X(X*X)-2X* = tr (XX*)+ = tr 5+. 

i=l 


Also, note that = tr S. 


Lemma 3.1 The lower and the upper bounds of Xb are given as follows. 

b n - 6 tr S' 

---X ^ ^---. 

n — b tr n — b n 


When b <1, it particularly holds that Xb < b/{{l — 6)tr S^}. 

Proof. Let f{x\c) = c/{x + c) for a positive constant c. Since f{x\c) is 
convex in x for x > 0, it is observed that 

ri^Xb 

tr S -|- nXb 


It /■ -. /t 

A-, V''' 3-1 
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which gives the upper bound of \b. Next, let g{x) = x/{l + x) for x > 0. 
The concavity of g{x) leads to 


^ /I ^ nXbtr 


i=i 

which gives the lower bound of Xb- 
When 6 < 1, we can see that 
Xb£ 


i=i 


n + Afctr 


6 = V--^>5^ 

Z —/ 1 I \ /j-l z—/ 


Xbi 


-1 


Afotr S'”' 


1 + Xbi 


Tri+A,E-=i 


n p-l 

i=l S’ 


1 + Afetr 


which yields that Xb < h/{{l — 6)tr S~^}. 


The finiteness of the risk of 6^ (b) is verified in the following lemma. 


Lemma 3.2 Assume that there exist positive constants Bi and B 2 such that 
Bi < b < B 2 < n. Ifp — n — 1 > 0, then the risk of Sp^{b) is finite. 

Proof. A simple calculation yields that 




aAfotr S ^ ^ log(£i + Xb) - (p-n) log Xt 


i=l 


+ npOp — p log Op + log det Y, — p. 


From the given assumption, there exist positive constants Ci and C 2 such 
that Ci/n <b/{n — b) < nC 2 . Using Lemma [3.11 we observe that 

log Cl — log tr < log Xb < log C 2 + log tr S. 


The well-known inequalities 1 — x~^ < logx < x — 1 for x > 0 imply that 
UpogAft] is finite if i?[trS'’''] < 00. Under the same condition, we can verify 
the finiteness of E[Xb] and log(U -|- Ab)]. 

Note that U[tr5+] = U[tr(X*X)-i] = U[tr (Z*SZ)-i], where Z ~ 
A/pxn(0pxn) Ip ® In)j SO that 

0 < U[tr5+] < U[tr(Z*Z)-^]trS'^ 

Note also that Z*Z ~ >V„(p,/„). Thus for p — n — 1 > 0, it follows that 
U[tr (Z*Z)“^] = n{p — n — 1)”^, which completes the proof. ■ 
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3.2 Dominance results 

In the case that p = r > n, dehne the eigenvalue decomposition of S as 
S = HLH^ with H G and L = diag(£i, G Take Hq as a 

p X [p — n) matrix such that {H, Hq) G 0{p). Consider here the following 
shrinkage estimator 

= Sf^ib) - apXtHH^ 

= ap{S + XhHoHl), 

where A;, and b are dehned in fl3.3p . The rank of Sp^{b) is p with probability 
one. If p — n — 1 > 0, the risk of 5p^{b) is hnite, which is verihed in the same 
way as LemmaThe following proposition can be obtained for domination 
of dp^{b) over Sp^(b). 

Proposition 3.1 In the model (II.ip . we consider the problem of estimating 
E relative to the usual Stein loss fll.2p . Assume that there exists a positive 
constant C such that b < C < n. Let Cq = 6(n + l)/(3p — 4n — 4) for 
3p — An — 4 > 0. If b > Con/(l + Cq) and db/dii > 0, then Sp^{b) 
dominates Sp^{b) relative to the usual Stein loss (ll.2p . 

The proof of Proposition 13.11 requires suitable bounds of the logarithmic 
function log(l+x). Here we employ an upper and a lower bounds of log(l+x) 
based on the Fade approximants. For details of the Fade approximants, 
see Baker and Graves-Morris (1996). The approximants yield the following 
simple lemma, whose proof is omitted since it can easily be verihed. 


Lemma 3.3 For x > 0, it follows that 

< log(l + x) < 


x(6 + x) 


2 + x ov ■ / 2(3 + 2x) 

The upper and the lower bounds given above are concave in x. 


Proof of Proposition 13.11 Note that 

+ Ho{XJp_n)Hl} 

and also 

^p^^ib) = ap{H{L + XJnW + H,{XJp_n)Hl}. 
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The difference in risk of Sp^{b) and 6p^{b) is written by 

= £'[—OpAfttr — logdet L + logdet(T + A;,/„)] 


= E 


XtT ^ iog(i + 


2=1 


(3.6) 


Using Lemma [2751 gives that Rp{Sp^{b), S)—S) = E[A^^], where 


SH 


E 

2=1 


dX 

ap{p -n - + log(l + (3-6) 


Thus, if < 0, then Sp^{b) dominates 6p^{b). 

Differentiating both sides of (13.dh with respect to U yields that 


Ah 


db 


so that 


c^AbA 1 Ah / SAftA ^ 

) ^iij + Ah {ii + Ab)2 {ij + Ah)2 dii ’ 

dXb _ X]i=i Ah(U + Ah) ^ + X]i=i 


E 

2 = 1 




Z)r=l + ^b) 


-2 


> 0. 


(3.7) 


Let f{x) = x(6 + a;)/(6 + Ax). Using Lemma ESI we observe that 

n n 

^iog(i +v-‘)<E/(wr‘) 

Ahtr 5'^(6n + Abtr S^) 


2=1 


2=1 




2=1 


6n + 4Ahtr 


, (3.8) 


where the second inequality follows from concavity of f{x). Combining 
fl3.6ll and fl3.8p gives that 


A^^ < —ap{p — n — l)Ahtr S 


^ Abtr S''^(6n + Abtr S'"*') 


= OpAbtr X 


6n + 4Abtr 5'^ 
6n{n + 1) — (3p — 4n — 4)Abtr 
6n + 4Abtr S'^' 
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Using the lower bound of A;, given in Lemma [3.11 we can see that < 0 
if 3p — 4n — 4 > 0 and 


6n(n + 1) — (3p — 4n — 4)- 


bn 


n 


< 0 , 


namely b > Con/(l + Cq). Hence the proof is complete. ■ 

We give two examples for b. First, b is restricted to a positive constant. 
The estimator Sp^{b) can be written as 

S^/{b) = d^^ + apX,HoHl 

where is given by (12.4p . The risk of Sp^{b) can alternatively be expressed 


as 


Rp{d^/{b), S) = Rn{S^^', 5]) + Rp.n{apXbH,Hl S), 


fSC 


(3.9) 


where and Rp-n are dehned in fl2.3p . It is much hard to hnd out an optimal 
constant for b. Furthermore, the performance of Sp^{b) would worsen if b is 
too large. So we take 


&n = 


cpn 
1 + Co 


(3.10) 


The resulting estimator 6p^(bp) dominates Sp^{bp) when 3p — 4n — 4 > 0. 
Next, consider 

b, = b,{S) = {l + iJi,)bo. (3.11) 

Note that ii > in, so 6i is bounded below and above as bp <bi < 2bp. Also, 
it is observed that 

f)h 

E ^ - c) > 0- 

i=l * 

Hence it is seen from Proposition 13. ll that Sp^{bi) dominates Sp^{bi) relative 
to the usual Stein loss fll.2p for 3p — 4n — 4 > 0. 

The risk expression fl3.9p suggests further modihed estimators 

and 

5f^{b) = (5f + apXbHpHl (3.12) 

where and <5^^ are dehned in Subsections 12.31 and 12.41 respectively. Then 

the following proposition can be proved in the same way as in Subsections 

ESI and [231 
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Proposition 3.2 In the model fll.ip . we consider the problem of estimating 
S relative to the usual Stein loss (O. Under the assumptions of Proposition 
\3.1{ Sp^{b) is dominated by S^'^^{b), and moreover {b) is dominated by 
5f^{b). 

Proposition 13.11 suggests that Sp^{b) dominates 5p^{b) if they depend on 
a common large b. For a small b, it seems to hold the reverse dominance 
relation. In fact, we obtain the following proposition. 


Proposition 3.3 In the model fll.ip . we consider the problem of estimating 
S relative to the usual Stein loss m- Assume that n > 2 and p — n — 1 > 0. 
Let c* = 2(n — l)/(p — n + 1) and 


&* = c*/(l + c*). (3.13) 

If Cq <b <b^ for a positive constant Cq and db/dli < 0, then Sp^(b) 
dominates Sp^{b) relative to the usual Stein loss fll.2p . 

Proof. In the similar way to fl3.7p . it is seen that 

d^b ^ + Aft) ^ Aft ^ + Aft) ^ 

K ^ ^ E".i m +y)-= ^ ^ F"" E”.i (iVi+ 

^AfttrS"^. (3-14) 


It follows from Lemma 13.31 that 


log(l + Ab4 


2Aft£-i 
-1 


i=l 


> 


i=i 2 + Xbli 


2Af 


tr2+AftEy 


n 


J = 1 J 


2Afttr 5+ 
2 + Afttr 


Combining fl3.6p . fl3.14p and fl3.15p gives that 


(3.15) 


A'^'^ > —ap{p — n — l)Afttr — 2apAfttr + 


2Afttr 


= OpAfetr S'"*" X 


2 + Afttr 

2{n — 1) — {p — n + l)Afttr 
2 + Afttr S'’*' 
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If 6 < 6* (< 1), using the upper bound of Lemma EH] for h <1 leads to 


2(n — 1) — (p — n + llAfotr > 2(n — 1) — (p — n + 1)—^ 

1 — 0 

> 2(n - 1) - (p - n + 1)—^ = 0, 

1-0* 

which implies that > 0. Thus the proof is complete. ■ 

Assume that 6 is a small constant satisfying b < The estimator 5p^{b) 
is expressed as 

df^{b) = ap{S + XJp) = ap{H{L + XtIn)H^ + Ho{XtIp-n)Hl}, 

so the (p — n) eigenvalues among the p nonzero eigenvalues of dp^{b) are 
identically apXb = X^/p. It is seen from Lemma [3.11 that 

b n ^ b 1 

-- < Afe <-3-. 

n-b tr5+ “ 1-6 tr5+ 

Note that > tr5^ > so that < (trS'*’)”^ < Moreover 
in the large-p and small-n case, c* and 6* probably is a very small value. 
Then A^/p may become extremely small, which implies that 5p^{b) may loss 
stability and deteriorate in performance. Therefore from Proposition 13.31 
6* may be a better choice for b. See the next subsection, which gives some 
simulated values of the risk of 6p^{b^). 

We can treat the Haff (1980) type empirical Bayes estimator 

= ap{S + culp), 

where u = l/trS”^ and c is a positive constant. Some dominance results on 
Sp^{c) and 5*p^^{c) = 6p^ — cuHH^ can be derived, and the details are 
omitted. 

3.3 Monte Carlo studies 

The Monte Carlo experiments have been performed for comparing the risks 
of some estimators for some p and n. Each experiment is based on 2,000 inde¬ 
pendent replications. We have investigated estimators Sp^{b) and S^^^{b), 
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which are defined in fl3.3p and fl3.12p . respectively. It has been assnmed that 
b = bo and bi, which are given in fl3.10p and fl3.1ip . respectively. Also the 
risk of dp^{b^) has been estimated in our experiments, where 6* is given by 

dsls]). 

Note that bo, bi and 6* satisfy b{S) = b{cS) for any positive number c. 
Also, when S is transformed into cS for a positive number c, Xb satisfying 
b{S) = b{cS) becomes cXb- Hence the risks of Sp^{b) and with 

b = bo, bi and 6* are invariant under the scale transformation S ^ cS and 
S —)■ cS for any positive number c. Furthermore the risks of Sp^{b) and 
6^^'^{b) are invariant under the orthogonal transformation S —)■ PSP^ and 
S —!■ PTiP* for any P G 0{p). 

In our experiments, it has been assumed, without loss of generality, that 
S is a diagonal matrix whose diagonal elements (namely, eigenvalues) are 
larger than or equal to one. The following diagonal matrices were considered 
for an unknown covariance S which should be estimated: 

1 ) 4 ; 

2) diag(10, 10^-^/P, ..., iqi-Ip-F/p); 

3) diag(100,100^"^/^, IOO^-^/p, ..., 100^-(p-2)/p, 

In Case 1), all the eigenvalues of H are identical. In Case 2) and 3), the 
eigenvalues of S are widely scattered and the largest eigenvalue is about 
tenfold or hundredfold of the smallest eigenvalue. 

Table [1] shows some simulated risk values. In the table, the value in 
parentheses stands for estimated standard error on risk. For reference, the 
exact risk of James and Stein’s (1961) minimax estimator are 37.096 {p = 
n = 50), 72.0995 {p = n = 100) and 106.959 {p = n = 150), which can be 
computed from fl2.16p and fl2.5p of this paper. 

For large n (= p/2), S//^'^{b) provides substantial reduction in risk of 
Sp^{b), but almost not for small n (= 5). In the large-n case, d//^'^{bo) is 
slightly better than {bi) and, in the small-n case, 5//^'^{bi) is the best 
estimator among estimators considered here. 

The estimator 6p^{b^) has an undesirable performance when n = 5, and 
however it enhances the performance as n increases for each p. In Case 3) 
with large n (= p/2), Sp^{b^) has the smallest risk. 
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All the risks of estimators investigated here considerably varies with the 
change of p, n and S. For example, the risks of (5p'®(6o) and 6p^{h^) have 
very different behavior with increasing n. Our numerical results suggest 
that, although an optimal selection of b would involve difficulty in practical 
application, we could recommend if p is much larger than n. 


3.4 A unified dominance result including both nonsin¬ 
gular and singular cases 

In Subsection 13.21 we provided some dominance results for p = r > n. The 
dominance results can be extended to all possible cases of orderings on n, p 
and r in the model fl 2 .ip . 

Note that the possible orderings on n, p and r are expressed by either 
min(n,p) > r or p > r > n. Let q = min(? 7 ,,r) and m = max(n, r). The 
eigenvalue decomposition of S is written as HLH^, where H G Vp,g and 
L = diag(f'i,..., £ 5 ) G Dg. Take Hq G Vp^p-q such that G 0{p). 

Let Afe be a unique solution of the equation 


E 


A 

ii X 


h. 


where 6 is a differentiable function of S and satishes 0 < 6 < g. The estima¬ 
tors df^{b) and S^^{b) are dehned by, respectively. 


6f^{b) 


an{S + XbHH^) 
“1“ A^Tp), 


for min(?7,,p) > r, 
for p > r > n, 


(3.16) 


r, 
(3.17) 

where dm = 

In the min(n,p) > r and the p = r > n cases, the dehnition fl3.16p and 
(13.17p imply that 6^^{b) and S^^{b) have the same rank as S. However, 
in the p > r > n case, Sf^{b) and S^^{b) are of rank p, while S'^(5f'®(6) 
and S’^5f^(6) are of rank r. This is verihed as follows: When p > r > n, 


5f^(6) = S^^ib) - dmXbHH^ = 


tEB 


dnS for min(n,p) ; 

r{S + XbHoHi), for p > r > n, 
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recall that H = XR where T G Vp^r and R E Vr,n; which are dehned in the 
beginning of Subsection 12.41 Take Tq G Vp,p-r and Rq G Vr,r-n such that 
(T, To) G 0(p) and (R,Ro) G 0(r). Dehne = Ti^o^^oT‘ + YoT*. 

Then it is seen that 

+ HoJ^o = r(RR^ + J^o^^o)Y' + ToT[, = YT* + YqY^, = Ip 

and 

Y t JLT ut 'Y' _ rt Tft 

^ ~ -Ko-Kq. 

Since = YO~^Y*, where Y G Vp^r and O is r x r positive dehnite, it is 
observed that 


Y*5f^(6)Y = ar{R{L + \an)R' + Abi^oi^o), 

Y*5f^(6)Y = ar{RLR^ + KRoRi), 

so that S^^{b) and Sf^{b) are of rank p, while S'^(5f^(6) and 'S'^Sf^{b) are 
of rank r. In such p > r > n case, XRqRqX* is not observable. Thus it is 
hard to hnd an estimator <5 satisfying that both 6 and S’*'(5 are of rank r. 

The difference in risk of S^^{b) and 6^^{b) with respect to the Stein loss 
(El) can be written as 

Rr{8^,^{b), S) - Rr{6f^{b), S) = ^[-a^Afetr S+ifif* + logdet(/g + AfeT”^)] 

for both the min(n,p) > r and the p > r > n cases. Hence the same 
arguments as in the proof of Proposition [TT] lead to the following proposition. 


Proposition 3.4 In the model fl2.ip . we consider the problem of estimating 
S relative to the Stein loss (12.2p . Let Co = 6{q + l)/(3m - 4g - 4) for 
3m — 4g — 4 > 0. Assume that cog/(l + cq) < b < C < q for a positive 
constant C and Y^l=idb/dli > 0. Then 5f^{b) dominates 8f^{b) for any 
possible ordering on n, p and r. 

Further improvements on S^^{b) can be established in the same way as in 
Subsections 12.31 and 12.41 Also, we can derive the reverse dominance relation 
such that Sf^{b) dominates S^^{b) as in Proposition 13.31 
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4 Some remarks 


This paper addresses the problem of estimating a high-dimensional covari¬ 
ance matrix of multivariate normal distribution and also discusses a unified 
extension to all possible cases of orderings on the dimension, the sample size 
and the rank of the covariance matrix. We conclude this paper with giving 
some remarks. 

In this paper, it is assumed that S has a known rank r in the singular 
model fl2.ip . When min(?7,,p) > r or p = r > n, the observation matrix 
X is of rank r with probability one and inherits the rank from the singular 
covariance matrix S. Thus, even if r is unknown, a value of r is evaluable 
from X as long as min(n,p) > r or p = r > n. However the p > r > n case 
with unknown r does not permit the evaluation of r, which deeply affects the 
accuracy of estimators, particularly when r is much smaller than p. 

Instead of the Stein loss fll.2p . we may employ the quadratic loss 

Lq{S, S) = tr S-i((5 - - S). (4.1) 

Selliah (1964) treated the n> p = r case of covariance estimation under fl4.1l) 
and obtained an improved estimator based on the LU decomposition of S. 
For other approaches, see Haff (1979, 1980, 1991), Yang and Berger (1994) 
and Tsukuma (2014). See also Konno (2009), who discussed the p = r > n 
case under the quadratic loss fl4.ip . For the singular case, the quadratic loss 
fl4.ll) probably should be replaced by 

L*q{5, S) = tr S+(5 - S)S+(5 - S). 

Indeed, we can easily obtain an improved estimator similar to Selliah (1964) 
via the same way as in Subsection 12.31 but the details are omitted here. 

The observation matrix X has the form X = BZ, where B is an un¬ 
known matrix of parameters and Z is a random matrix. The dominance 
results of Section [2] can be extended to the estimation problem of a scale 
matrix in an elliptical distribution model, where the p.d.f. of Z has the form 
f{ti ZZ^) for an integrable function /. The n> p = r case with the usual 
Stein loss fll.21) has been studied by Kubokawa and Srivastava (1999). Their 
dominance results can be extended to our singular case. 
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Table 1; Simulated risk with respect to the usual Stein loss. 



P 

n 


C(feo) 


C(&i) 

<5^"(6*) 

1) 

50 

5 

28.6(0.07) 

28.5(0.08) 

18.4(0.08) 

18.2(0.09) 

113.4(0.10) 



15 

5.0(0.01) 

2.7(0.02) 

4.8(0.01) 

2.0(0.02) 

86.2(0.06) 



25 

24.3(0.08) 

8.7(0.02) 

26.5(0.10) 

9.6(0.03) 

67.3(0.06) 


100 

5 

115.8(0.13) 

115.8(0.13) 

82.4(0.16) 

82.3(0.16) 

301.1(0.14) 



25 

13.3 (0.02) 

10.8(0.03) 

10.4(0.02) 

7.3(0.03) 

228.7(0.07) 



50 

41.2(0.08) 

14.1(0.02) 

44.3 (0.09) 

15.4(0.02) 

165.3(0.06) 


150 

5 

230.7(0.16) 

230.7(0.16) 

170.9(0.22) 

170.9(0.22) 

516.7(0.18) 



40 

18.0 (0.02) 

13.7(0.03) 

14.9 (0.01) 

9.6(0.03) 

377.5 (0.06) 



75 

59.0(0.07) 

19.9(0.02) 

63.1 (0.08) 

21.5(0.02) 

276.1(0.06) 

2) 

50 

5 

26.0(0.07) 

25.9(0.08) 

19.1(0.07) 

18.9(0.08) 

105.6(0.12) 



15 

13.3(0.02) 

11.0(0.01) 

14.4(0.03) 

11.6(0.02) 

81.7(0.07) 



25 

44.2(0.13) 

27.6(0.06) 

46.3(0.14) 

28.9(0.07) 

65.1(0.06) 


100 

5 

103.0(0.14) 

102.9(0.14) 

75.4(0.17) 

75.3(0.17) 

282.9(0.17) 



25 

23.7(0.01) 

21.3(0.01) 

23.7(0.01) 

20.8(0.01) 

217.6(0.08) 



50 

76.7(0.12) 

48.2(0.06) 

79.5(0.13) 

49.9(0.06) 

160.5(0.06) 


150 

5 

207.4(0.18) 

207.4(0.18) 

155.4(0.23) 

155.4(0.23) 

488.0(0.21) 



40 

35.6 (0.01) 

31.3(0.01) 

36.1 (0.01) 

31.1(0.01) 

361.2(0.08) 



75 

110.7(0.11) 

69.6(0.06) 

114.5(0.12) 

71.9(0.06) 

268.6(0.06) 

3) 

50 

5 

35.7(0.02) 

35.6(0.02) 

38.4(0.08) 

38.2(0.07) 

87.1(0.14) 



15 

61.9(0.17) 

59.2(0.16) 

65.1(0.20) 

62.2(0.18) 

70.8(0.09) 



25 

125.6(0.33) 

105.7(0.27) 

127.5(0.34) 

107.2 (0.28) 

59.8(0.07) 


100 

5 

87.4(0.11) 

87.4(0.11) 

77.2 (0.08) 

77.2 (0.08) 

237.3(0.21) 



25 

99.4(0.12) 

96.7(0.12) 

104.7(0.15) 

101.8(0.14) 

188.8(0.10) 



50 

219.5(0.31) 

186.2(0.25) 

221.8(0.32) 

188.1(0.26) 

148.2 (0.07) 


150 

5 

165.1(0.18) 

165.1(0.18) 

135.9(0.17) 

135.9(0.17) 

415.0(0.26) 



40 

154.4(0.13) 

149.7(0.12) 

161.3(0.16) 

156.1(0.14) 

318.2(0.10) 



75 

317.2(0.31) 

269.7(0.25) 

320.3(0.31) 

272.2(0.26) 

249.2 (0.07) 





